Clustering of proximal sequence space for the identification of protein families

Authors

  • Federico Abascal
  • Alfonso Valencia
Abstract

MOTIVATION: The study of sequence space, and the deciphering of the structure of protein families and subfamilies, has up to now been required for work in comparative genomics and for the prediction of protein function. With the emergence of structural proteomics projects, it is becoming increasingly important to be able to select protein targets for structural studies that will appropriately cover the space of protein sequences, functions and genomic distribution. These problems motivate the development of methods for clustering protein sequences and building families of potentially orthologous sequences, such as those proposed here.

RESULTS: First, we developed a clustering strategy (the Ncut algorithm) capable of forming groups of related sequences by assessing their pairwise relationships. The results presented for the ras superfamily of proteins are similar to those produced by other clustering methods, but without the need to cluster the full sequence space. The Ncut clusters are then used as the input to a process that reconstructs groups of closely related sequences with equilibrated genomic composition. The results of applying this technique to the data set used in the construction of the COG database are very similar to those derived by the human experts responsible for this database.

AVAILABILITY: The analyses of different systems, including the 21 genomes equivalent to the COG data set, are available at http://www.pdg.cnb.uam.es/GenoClustering.html.
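The abstract does not spell out how the Ncut step works, so the following is only a minimal sketch of a generic normalized-cut bipartition (the standard spectral formulation), not the authors' implementation. The function names `ncut_bipartition` and `ncut_value`, the use of NumPy, and the toy similarity matrix are all assumptions made for illustration; in practice the pairwise scores would come from sequence comparisons.

```python
import numpy as np


def ncut_value(W, mask):
    """Normalized cut of the bipartition (mask, ~mask) of similarity matrix W."""
    cut = W[mask][:, ~mask].sum()   # total similarity crossing the partition
    assoc_a = W[mask].sum()         # total similarity attached to group A
    assoc_b = W[~mask].sum()        # total similarity attached to group B
    return cut / assoc_a + cut / assoc_b


def ncut_bipartition(W):
    """Split items into two groups using the second eigenvector of the
    symmetric normalized Laplacian (spectral relaxation of the normalized cut).

    Assumes W is symmetric, non-negative, and every row has a positive sum.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)
    y = d_inv_sqrt * eigvecs[:, 1]  # map back to the generalized eigenvector
    return y > 0                    # sign of each entry assigns the group


# Hypothetical pairwise similarities for six sequences: two tight groups
# (indices 0-2 and 3-5) that are only weakly similar to each other.
W_toy = np.full((6, 6), 0.05)
W_toy[:3, :3] = 0.9
W_toy[3:, 3:] = 0.9
np.fill_diagonal(W_toy, 0.0)

groups = ncut_bipartition(W_toy)
score = ncut_value(W_toy, groups)
print(groups, round(score, 4))
```

Applied recursively to each resulting group until the normalized-cut value exceeds some threshold, a bipartition of this kind yields a set of clusters; the stopping rule and the similarity measure are design choices that the abstract leaves unspecified.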

Journal:
  • Bioinformatics

Volume 18, Issue 7

Pages -

Publication date: 2002